Tag Recommendations and Licenses

■ Motivation and context 

■ Formalize aspects of privacy legislation 

Using a logic programming language 

■ Answer whether legislation/best practice permits or denies specific 
actions on data sets 

■ Expert-system-like ability

■ Explore legislation 

e.g., find conditions where best practice is contradictory

■ Combines 

■ computer science (formal modeling), 

■ law (legal research & analysis), 

■ social science (survey design), 

■ information science (taxonomies) 
System design

[Figure: system design diagram; surviving label: Data Owner]

Stephen Chong, Harvard University.

Formal model: Actions

dd : Data depositor
r : Repository
du : Data user
ds : Data set
cs : Condition set (provides further details about action)

Deposit(dd, ds, r, cs)
Accept(r, ds, dd, cs)
Release(r, ds, du, dd, cs)
Permitted or Denied

■ Actions can be permitted or denied

Permitted(leg, a)
Denied(leg, a)

leg : Legislation

■ Or neither permitted nor denied

■ E.g., Denied(ferpa,
          Release(harvardDataverse,
                  cs152grades-2015sp,
                  jon@doe.com,
                  chong@seas.harvard.edu,
                  [dataverseClickthrough]))
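
The three outcomes (permitted, denied, neither) can be sketched in Python. This is a minimal illustration, not the project's implementation: the `Release` record, the `derived` helper, and the toy rule `ferpa_grades_rule` are all hypothetical names, and the rule's content is invented purely to reproduce the example above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Release:
    """Release(r, ds, du, dd, cs) from the formal model."""
    repository: str
    dataset: str
    user: str
    depositor: str
    conditions: Tuple[str, ...]


# A rule returns "permitted", "denied", or None when it says nothing.
Rule = Callable[[str, Release], Optional[str]]


def ferpa_grades_rule(leg: str, action: Release) -> Optional[str]:
    # Hypothetical toy rule (an assumption, not real legal analysis):
    # under "ferpa", releasing a dataset named like a grades file is denied.
    if leg == "ferpa" and "grades" in action.dataset:
        return "denied"
    return None


def derived(leg: str, action: Release, rules: Iterable[Rule]) -> FrozenSet[str]:
    """Collect every verdict the rules derive for (leg, action).

    An empty set means neither permitted nor denied; a set holding both
    values would expose a contradiction between rules.
    """
    verdicts = set()
    for rule in rules:
        verdict = rule(leg, action)
        if verdict is not None:
            verdicts.add(verdict)
    return frozenset(verdicts)


example = Release(
    "harvardDataverse",
    "cs152grades-2015sp",
    "jon@doe.com",
    "chong@seas.harvard.edu",
    ("dataverseClickthrough",),
)
RULES = [ferpa_grades_rule]
```

Returning a set rather than a single value keeps Permitted and Denied as independent predicates, so the "neither" case and a contradictory pair of rules both remain visible.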
Example formalization

Let dd be the data depositor
Let du be the data user
Let ds be the data set
Let r be the repository
Let cs be a set of conditions

IF CMR:depositorInScope(dd, ds)
AND CMR:identifiable(ds)
AND NOT (CMR:secure(r)
         AND CMR:isAcceptableConditionsForRelease(cs))
THEN DENIED(Release(r, ds, du, dd, cs))

Let l be a license
Let cs be a set of conditions

IF License(l) ∈ cs
AND licenseImplies(l, CMR:TransmissionEncrypted)
THEN CMR:isAcceptableConditionsForRelease(cs)
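
The two rules above can be mirrored in a few lines of Python to see how they interact. This is a sketch under stated assumptions: the `Facts` container, the encoding of conditions as `(kind, name)` pairs, and the reading of the first rule as NOT (secure AND acceptable conditions) are illustrative choices, not the formalization used in the project.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

# Assumed encoding: a condition is a (kind, name) pair,
# e.g. ("License", "exampleLicense").
Condition = Tuple[str, str]


@dataclass
class Facts:
    """Ground facts for the CMR predicates (hypothetical container)."""
    depositor_in_scope: Set[Tuple[str, str]] = field(default_factory=set)  # (dd, ds)
    identifiable: Set[str] = field(default_factory=set)                    # ds
    secure: Set[str] = field(default_factory=set)                          # r
    license_implies: Set[Tuple[str, str]] = field(default_factory=set)     # (l, requirement)


def acceptable_conditions_for_release(facts: Facts, cs: FrozenSet[Condition]) -> bool:
    # Second rule: some License(l) in cs implies TransmissionEncrypted.
    return any(
        kind == "License" and (name, "TransmissionEncrypted") in facts.license_implies
        for kind, name in cs
    )


def release_denied(facts: Facts, r: str, ds: str, du: str, dd: str,
                   cs: FrozenSet[Condition]) -> bool:
    # First rule; du does not occur in the rule body, only in the action.
    return (
        (dd, ds) in facts.depositor_in_scope
        and ds in facts.identifiable
        and not (r in facts.secure and acceptable_conditions_for_release(facts, cs))
    )
```

With facts stating that the depositor is in scope and the data set is identifiable, the release is denied unless the repository is secure and the condition set carries a license implying encrypted transmission.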
Demo
DataTags

■ Permitted and denied actions are the interface between 
DataTags and legislation 

■ May require more powerful language than Prolog... 



Let dd be the data depositor 
Let du be the data user 
Let ds be the data set 
Let r be the repository 
Let t be the data tag 

IF isDataTag(t, r, ds)
AND FOR ALL condition sets cs
      PERMITTED(Release(r, ds, du, dd, cs))
      IMPLIES conditionsRequire(cs, ReidentificationProhibited)
THEN atLeast(t, Yellow)
DataTags Questionnaire in Datalog

■ When accepting dataset, ask depositor series of 
questions to determine DataTag 

■ Currently: nice domain specific language 

■ But imperative with explicit control flow (i.e., gotos) 

■ Goal: express more declaratively using Datalog 

■ Separate questions from control flow 

■ Facilitate composition of questionnaires 

■ Re-run old answers when questionnaire changes 
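
One way to picture the declarative goal is to keep questions and decision rules as plain data and let a generic driver work out what still needs asking. The sketch below is a Python illustration only (the project targets Datalog); the question texts, the rule table, and the tags other than Yellow are hypothetical.

```python
from typing import Dict, Optional, Set

# Questions are data: no ordering, no gotos.
QUESTIONS: Dict[str, str] = {
    "identifiable": "Does the dataset contain identifiable information?",
    "medical": "Does the dataset contain medical records?",
}

# Each rule maps a combination of answers to a tag (hypothetical content).
RULES = [
    ({"identifiable": "no"}, "Green"),
    ({"identifiable": "yes", "medical": "no"}, "Yellow"),
    ({"identifiable": "yes", "medical": "yes"}, "Red"),
]


def decide(answers: Dict[str, str]) -> Optional[str]:
    """Return the tag of the first rule fully satisfied by the answers."""
    for required, tag in RULES:
        if all(answers.get(q) == a for q, a in required.items()):
            return tag
    return None


def open_questions(answers: Dict[str, str]) -> Set[str]:
    """Unanswered questions that occur in rules still consistent with the answers."""
    pending: Set[str] = set()
    for required, _tag in RULES:
        if all(answers.get(q, a) == a for q, a in required.items()):
            pending |= {q for q in required if q not in answers}
    return pending
```

Because the control flow lives in the driver rather than in the questionnaire, stored answers can be replayed by calling `decide` again after `RULES` changes, and two questionnaires compose by merging their question and rule tables.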
"Optimize" question order

■ Given declarative questionnaire, what is "best" order to ask 
questions? 

■ Fewest questions to reach decision? 

■ Ask questions from general to specific? 

■ Ask related questions at same time? 

■ Assume some cost function for the question order 

■ Characterize as a game 

■ Player asks question, opponent gives answer 

■ Player's goal: reach decision with lowest cost 

■ Determine strategy with lowest (expected) cost 

■ Game tree too big to explore exhaustively 

• E.g., with n questions, 3 answers per question, there are n! × 3^n paths/final
states

■ But analysis of Datalog program can significantly reduce search 
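
The game view can be made concrete on a tiny questionnaire. The sketch below assumes a unit cost per question, answers drawn uniformly at random, and a hypothetical three-question `decision` procedure; it reads the path count as n! question orderings times 3^n answer combinations. It explores the game tree exhaustively, which is only feasible at this toy size.

```python
from functools import lru_cache
from math import factorial

ANSWERS = ("yes", "no", "unknown")                    # 3 answers per question
QUESTIONS = ("identifiable", "medical", "consent")    # hypothetical questions


def path_count(n, answers_per_question=len(ANSWERS)):
    """Question orderings times answer combinations: n! * k^n."""
    return factorial(n) * answers_per_question ** n


def decision(answers):
    """Hypothetical decision procedure; None means 'not decided yet'."""
    if answers.get("identifiable") == "no":
        return "Green"
    if "identifiable" not in answers:
        return None
    if answers.get("medical") == "no":
        return "Yellow"
    if "medical" not in answers or "consent" not in answers:
        return None
    return "Yellow" if answers["consent"] == "yes" else "Red"


@lru_cache(maxsize=None)
def expected_cost(state=frozenset()):
    """Minimum expected number of further questions from this state."""
    answers = dict(state)
    if decision(answers) is not None:
        return 0.0
    remaining = [q for q in QUESTIONS if q not in answers]
    return min(
        (1.0 + sum(expected_cost(state | {(q, a)}) for a in ANSWERS) / len(ANSWERS)
         for q in remaining),
        default=0.0,
    )


def best_first_question(state=frozenset()):
    """Unanswered question whose subtree has the lowest expected cost."""
    answers = dict(state)
    remaining = [q for q in QUESTIONS if q not in answers]
    return min(
        remaining,
        key=lambda q: sum(expected_cost(state | {(q, a)}) for a in ANSWERS),
    )
```

Memoizing on the set of answers given so far collapses paths that differ only in the order the same questions were asked, a small instance of reducing the search by exploiting the structure of the questionnaire.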
Our very own Datalog...

■ Developed our own Datalog implementation 

■ Can extend with language features 

■ More flexible interface/efficient interaction 

■ Make use of modern concurrent hardware 

■ Will be used in Harvard undergrad PL course 

■ ... 

■ Will be released open source 
Current state

■ Current state: 

■ Six evaluation engines 

Top-down, bottom-up, concurrent bottom-up, ...

■ Exploring different concurrent techniques to improve scalability 

Preliminary results: 1.2-5.5x speedup over XSB Prolog on
OpenRuleBench transitive closure tests

■ Implemented hypotheticals 

■ Graphical user interface 

■ Suitable for use by undergrad class! 
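
For readers unfamiliar with bottom-up evaluation, the transitive closure workload mentioned above can be sketched in Python with the standard semi-naive technique. This is a generic textbook illustration, not one of the six engines described here, and the function name is an assumption.

```python
def transitive_closure(edges):
    """Semi-naive bottom-up evaluation of the Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Each round joins only the facts derived in the previous round (delta)
    with edge, instead of re-joining the whole path relation.
    """
    edges = set(edges)

    # Index edge by its first argument to make the join cheap.
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)

    path = set(edges)
    delta = set(edges)
    while delta:
        new = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        } - path
        path |= new
        delta = new
    return path
```

The loop stops at the fixpoint, when a round derives no new `path` facts, so it terminates on cyclic graphs as well.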
Moving forward

■ Formal legal model 

■ License generation (from required conditions) 

■ Review/independent validation of rules and license text 

■ Independent validation of formalization process 

■ Engagement with practitioners 

IRBs, state and local govt, agencies, educational data controllers, ... 

■ Questionnaire representation and optimization 

■ Datalog 

■ Release and use 

■ Develop right logical extension for, e.g., connecting to 
DataTags 